#RL scaling
12/08/2025
ProRLv2: NVIDIA Extends Reinforcement Learning to Unlock Deeper LLM Reasoning
ProRLv2 scales RL training to 3,000 steps and combines regularization and exploration techniques to expand reasoning capabilities in compact LLMs, showing strong benchmark gains across math, coding, logic and STEM tasks.